Alignment Tools for Parallel Treebanks
Abstract
This paper reports on our efforts to create a trilingual parallel treebank. The focal points are consistency checking and all aspects of sub-sentential alignment. We discuss the alignment guidelines, the importance of quality checks, and special alignment problems. We then look at alignment algorithms and alignment visualization tools, and compare our own TreeAligner with other alignment tools. Our constituent structure treebanks contain just over 1,000 sentences and around 18,000 tokens in each language.
Similar resources
A Search Tool for Parallel Treebanks
This paper describes a tool for aligning and searching parallel treebanks. Such treebanks are a new type of parallel corpora that come with syntactic annotation on both languages plus sub-sentential alignment. Our tool allows the visualization of tree pairs and the comfortable annotation of word and phrase alignments. It also allows monolingual and bilingual searches including the specification...
Unsupervised Generation of Parallel Treebanks through Sub-Tree Alignment
The need for syntactically annotated data for use in natural language processing has increased dramatically in recent years. This is true especially for parallel treebanks, of which very few exist. The ones that exist are mainly hand-crafted and too small for reliable use in data-oriented applications. In this paper we introduce an open-source system for fast and robust automatic generation of para...
XML-based Phrase Alignment in Parallel Treebanks
This paper describes the usage of XML for representing cross-language phrase alignments in parallel treebanks. We have developed a TreeAligner as a tool for interactively inserting and correcting such alignments as an independent level of treebank annotation.
Divergences in English-Hindi Parallel Dependency Treebanks
We present here our analysis of systematic divergences in parallel English-Hindi dependency treebanks based on the Computational Paninian Grammar (CPG) framework. The study of structural divergences in parallel treebanks not only helps in developing larger treebanks automatically, but can also be useful for many NLP applications such as data-driven machine translation (MT) systems. Given that the ...
Publication date: 2007